Mathematical Programming in Machine Learning and Data Mining

Authors

  • Katya Scheinberg
  • Jiming Peng
  • Tamas Terlaky

Abstract

The fields of Machine Learning (ML) and Data Mining (DM) center on the following problem: given a data domain D, we want to approximate an unknown function y(x) on a given data set X ⊂ D (for which the values of y(x) may or may not be known) by a function f from a given class F, so that the approximation generalizes as well as possible to all of the (unseen) data x ∈ D. The approximating function f might take real values, as in regression; binary values, as in classification; or integer values, as in some cases of ranking. Alternatively, f might be a mapping between ordered subsets of data points and ordered subsets of real, integer or binary values, as in structured object prediction. The quality of the approximation by f can be measured by various objective functions. For instance, in support vector machine (SVM) [4] classification, the quality of the approximating function is estimated by a weighted sum of a regularization term h(f) and the hinge loss term $\sum_{x \in X} \max\{1 - y(x) f(x),\, 0\}$. Hence, many machine learning problems can be posed as optimization problems in which a chosen objective is optimized over a given class F. The connection between optimization and machine learning, although always present, became especially evident with the popularity of SVMs [4], [24] and of kernel methods in general [18]. The SVM classification problem is formulated as a convex quadratic program.
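To make the SVM objective above concrete, here is a minimal Python sketch. It assumes a linear class F (f(x) = w·x + b), labels y(x) ∈ {−1, +1}, the regularizer h(f) = ‖w‖²/2, and a weight C on the hinge loss; the function names and the toy data are hypothetical. Rather than solving the convex quadratic program mentioned above, it minimizes the same primal objective by plain subgradient descent, a simpler technique swapped in purely for illustration.

```python
from typing import List, Sequence, Tuple


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision function f(x) = w.x + b (assumed choice of class F)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w: Sequence[float], b: float,
               X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Hinge loss term: sum over x in X of max{1 - y(x) f(x), 0}."""
    return sum(max(1.0 - yi * decision(w, b, xi), 0.0) for xi, yi in zip(X, y))


def svm_objective(w: Sequence[float], b: float,
                  X: Sequence[Sequence[float]], y: Sequence[int],
                  C: float = 1.0) -> float:
    """Weighted sum: regularizer h(f) = ||w||^2 / 2 plus C times the hinge loss."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * hinge_loss(w, b, X, y)


def train_subgradient(X: Sequence[Sequence[float]], y: Sequence[int],
                      C: float = 1.0, steps: int = 500,
                      lr: float = 0.01) -> Tuple[List[float], float]:
    """Minimize svm_objective by subgradient descent (not a QP solver)."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        # The gradient of the regularizer ||w||^2 / 2 is w itself.
        gw = list(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            # A point with margin y f(x) < 1 contributes -C y x to the
            # subgradient w.r.t. w and -C y to the subgradient w.r.t. b.
            if yi * decision(w, b, xi) < 1.0:
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        # Constant-step update of both parameters.
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


# Hypothetical, linearly separable toy data.
X_toy = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y_toy = [1, 1, -1, -1]

w_fit, b_fit = train_subgradient(X_toy, y_toy, C=1.0)
preds = [1 if decision(w_fit, b_fit, x) >= 0 else -1 for x in X_toy]
print("w =", w_fit, "b =", b_fit)
print("objective:", svm_objective(w_fit, b_fit, X_toy, y_toy))
print("predictions:", preds)
```

A dedicated QP solver (or a dual method) would be the more standard way to solve the SVM problem; subgradient descent is used here only because it fits in a few lines of standard-library Python. With a constant step size it oscillates near the optimum rather than converging to it exactly.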

Similar sources

Presenting a Multi-Objective Mathematical Optimization Model for Classification

In this paper we investigate the issue of data classification (one of the branches of data mining science) in the form of a multi-objective mathematical programming model. The model that we present and investigate is a MODM problem. For the first time, based on the support vector machine (SVM) idea (maximizing the margin between two groups), a multi-criteria mathematical programming model was proposed for data m...

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. In this regard, an important issue is that preparing some features may require difficult and costly experiments, while these features have less significant impact on the final decision and can be omitted from the feature set. Therefore, developing a machine for p...

Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data

Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

Publication date: 2007